Human alterations of the global floodplains 1992–2019

Floodplains provide critical ecosystem services; however, loss of natural floodplain functions caused by human alterations increase flood risks and lead to massive loss of life and property. Despite recent calls for improved floodplain protection and management, a comprehensive, global-scale assessment quantifying human floodplain alterations does not exist. We developed the first publicly available global dataset that quantifies human alterations in 15 million km2 floodplains along 520 major river basins during the recent 27 years (1992–2019) at 250-m resolution. To maximize the reuse of our dataset and advance the open science of human floodplain alteration, we developed three web-based programming tools supported with tutorials and step-by-step audiovisual instructions. Our data reveal a significant loss of natural floodplains worldwide with 460,000 km2 of new agricultural and 140,000 km2 of new developed areas between 1992 and 2019. This dataset offers critical new insights into how floodplains are being destroyed, which will help decision-makers to reinforce strategies to conserve and restore floodplain functions and habitat.

Methods input data sources. We quantified the human alterations of global floodplains from three input data sources: (1) the GFPLAIN250m global floodplain extent dataset 33  Floodplains and non-floodplains definition. Floodplains were defined using GFPLAIN250m dataset. GFPLAIN250m is based on a geomorphic analysis of the Digital Terrain Model (DTM). Its underlying algorithm distinguishes floodplains from surrounding hillslopes as landscape features that have been naturally shaped by accumulated geomorphic effects of past flood events [38][39][40][41][42] . Therefore, this floodplain dataset does not indicate a specific magnitude or return period of flooding (e.g., 100-year floodplains 43,44 ). GFPLAIN250m is efficient in mapping floodplains where water-driven erosion and depositional processes are predominant features but have limited efficiency in deserts and ice-covered regions (hence, places like northern Africa, Persian Gulf, Tibetan plateau, and the region above 60 degrees north latitude are not covered by this dataset). The dataset is available in 250-m spatial resolution gridded GeoTIFF format. Areas beyond those identified as floodplains were classified as non-floodplains.
While many sophisticated algorithms, models, and datasets are available for floodplain delineation [43][44][45] , we selected the GFPLAIN250m dataset considering four factors: spatial resolution, global coverage, well-established literature pool demonstrating the accuracy of the dataset, and importantly a robust algorithm based on the hydrogeomorphological aspects of floodplain formation (not specific to 100-year floods). The GFPLAIN algorithm is openly accessible and has been validated in numerous published research including our previous floodplain alteration study on the Mississippi River Basin 31,46-50 .
Land use. The CCI land use dataset is based on the GlobCover unsupervised classification chain 51 framework. The framework generated global annual land use maps from 1992 to 2019 by using a multi-year and multi-sensor strategy 52 36 . The dataset defines 37 land use classes following the United Nations Land Cover Classification System (UN LCCS) 53,54 . It is available at 300-m spatial resolution in gridded GeoTIFF and NetCDF format. We used the entire range of currently available CCI land use from 1992 to 2019 in our approach to derive the floodplain alteration dataset.
We considered multiple alternative land use datasets [55][56][57][58] while designing our methodology. Using these datasets as the primary inputs in our analysis posed two major challenges. Many datasets encompass intermittent periods of data availability and lack continuity. For example, the cropland extent and change product in the www.nature.com/scientificdata www.nature.com/scientificdata/ Fig. 1 Human alterations of the global floodplains between 1992 and 2019 across 520 major river basins 37 . Human alteration was defined as changes in floodplain land use (e.g., wetland → agriculture) caused by human disturbances that negatively impact floodplain functions. Plot (a) maps the degree of floodplain alteration as percent of floodplain area (i.e., total area of "negatively impacting" land use change within the floodplain/total floodplain area of the basin × 100), thus allowing a consistent analysis regardless of the differences in basin sizes and floodplain extents therein. Note, the floodplain dataset used in this analysis (GFPLAIN250m 33 ) does not cover deserts and ice-covered regions. Hence, places like northern Africa, Persian Gulf, Tibetan plateau, and the region above 60 degrees north latitude are not shown in plot (a). To identify the characteristic pattern of floodplain alterations, plot (b) shows how different alteration types in floodplain land use varied at every 250-m spatial resolution along the latitude, as well as the pattern of inter-class land use transitions at continental scales. Plot (b) is further supported by Supplementary Fig. 1 showing how the floodplain alterations at different continents contribute to the overall global floodplain alterations. Plot (c) are time-series graphs showing the continuous increase in the area (km 2 ) of altered floodplains at continental scales along the 27 years of analysis (1992-2019). Plot (d) compares relative degree of alterations within the floodplain and the remaining part of the landscape that is outside the floodplain (i.e., non-floodplain) respectively for every continent. All corresponding data are available for download via HydroShare platform 32 57 including only tree canopy cover, short vegetation cover, and bare ground cover) exclude important land classes such as wetlands and grasslands which are predominant in global floodplains.
The CCI land use dataset was the best available resource for our targeted scope because (i) it has global coverage with a high spatial resolution (300-m) which is consistent with the spatial resolution of our floodplain extent dataset (250-m), (ii) it uses a detailed land classification scheme, (iii) CCI's remote sensing-based algorithm as well as the land use product have been validated in numerous published research, including our previous floodplain alteration study on the Mississippi River Basin 31 , and (iv) the dataset is regularly updated to include remotely sensed observations in recent years, thus enabling a pathway to potentially extend our work and develop a continuous floodplain alteration dataset.
River basins. The GRDC Major River Basins of the World 2020 was used to define basin boundaries. The dataset was revised and extended from its 2007 edition incorporating data from HydroSHEDS 59,60 and representing 520 major basins across 977 rivers worldwide 37 . The calculated drainage area per basin ranges from 682 km² up to 6 million km², for the Coatan River Basin and the Amazon River Basin respectively. More than 250 of these 520 basins are transboundary, with more than 90 basins shared by three or more countries. GRDC refers these basins as the "major basins" considering their catchment sizes and hydro-political significance.
We used the Global Human Modification (1990-2017) dataset 61,62 for verification purposes. This reference dataset is discussed in detail in the Technical Validation section.
Approach. In a 3-step approach (Fig. 3), we determined: (1) degree of human alterations in floodplains, (2) characteristic patterns and temporal trends, and (3) relative difference between floodplain and "non-floodplain" landscape alterations. All associated data processing and computation tasks were performed in ArcGIS 10.5 and ENVI 5.1 geospatial platforms. www.nature.com/scientificdata www.nature.com/scientificdata/ Degree of human alterations in floodplains. We defined "human alterations of floodplains" in terms of changes in floodplain land use that negatively impact floodplain functions. For example, loss of floodplain wetlands and corresponding increase in agricultural or developed areas is a human disturbance, imposes negative impacts (reduced flood water and nutrient storage), and is unlikely to return into a pre-disturbance state (agriculture areas becoming wetlands). We refer to human alterations of floodplains simply as floodplain alterations throughout this paper.
Our definition of human alterations of floodplains is supported by various published research on global change and sustainability science which consider land use change as the direct indicator of human activities and their negative consequences [63][64][65] . While these functional consequences are complex and often have unknown degree of cause-and-effect relationships among them, use of land use change data is a parsimonious yet highly efficient way to identify the locations and temporal rates at which these consequences might have occurred within the global floodplains over a long period of time (Figs. 1, 2). Although recent studies provided valuable insights into river-floodplain connectivity/integrity by mapping dam and levee locations and/or using statistical tools such as the Connectivity Status Index (CSI) [66][67][68][69] , there are three major limitations. Specifically, (i) indicators of connectivity like dams and levees suggest potential altered exchange of water, sediment, nutrients, and habitat along the river network and between the river channel and the adjacent floodplain areas, and therefore do not explicitly indicate floodplain alterations. (ii) The existing river-floodplain connectivity data (e.g., levee locations, CSI values) are unevenly distributed across the globe and static, without providing spatially and temporally continuous estimates essential to understand yearly extent and long-term trends of floodplain alterations. (iii) River-floodplain connectivity does not necessarily involve human drivers (e.g., beaver dams) 70,71 . In short, emerging data of river-floodplain connectivity/integrity are better-suited to study river-floodplain ecosystems in general and may not be applied as the sole indicator of floodplain alterations. Therefore, our approach to define human floodplain alterations based on land use change data is justified because it offers a quantitative, comprehensive way to measure how floodplains are being altered across space and time 2 , and serves as a direct indicator of the consequences of human disturbance on floodplain ecosystem functions 3,30 . Use of land use change data also offers a scalable pathway to link our floodplain alteration product with numerous existing land planning and policy decision tools 1,72 .
In our previous work on the Mississippi River Basin 31 , we used total area of floodplain land use change as a measure of floodplain alterations. However, because of large variations in floodplain extents across the world's basins, it is not logical to use the total area of floodplain land use change to compare relative floodplain alterations between two basins. For example, our calculation suggests the same total area of floodplain land use change (945 km 2 ) in the Schelde River Basin in Europe and Limpopo River Basin in Africa, although the two basins have very different sizes (19,000 km 2 and 413,000 km 2 , respectively) and floodplain extents (10,000 km 2 and 38,000 km 2 , respectively). Clearly, the 945 km 2 floodplain land use change in these two basins does not express the same magnitude of impact. www.nature.com/scientificdata www.nature.com/scientificdata/ To enable a consistent comparison of floodplain alterations by normalizing the differences in basins' floodplain extents, we calculated the degree of floodplain alteration as percent floodplain extent using this equation: Here, LU ChangeArea i is the total area of land use change within the floodplain of a basin, FP TotalExtent i is the total area of floodplain extent within that basin, and i indicates floodplain grid-cells. We applied this equation for each of the 520 basins to calculate human alterations of floodplains between 1992 and 2019 (Fig. 1a). The intermediate steps involved in this calculation are elaborated below. Majority of these intermediate steps are adapted from our previous work on the Mississippi River Basin 31 .
Land use reclassification. To facilitate easy translation and application of our floodplain alteration dataset across interdisciplinary research, education, and decision-making tools, we reclassified the original 37-class CCI land use dataset into a generic 7-class land use dataset. These seven classes included major land use types: (1) open water, (2) developed area, (3) barren land, (4) forest, (5) grassland, (6) agriculture, and (7) wetland (Supplementary Table 1 , 1992-1993, 1992-1994, and finally 1992-2019), the outcome was a new map with two possible attributes: "1" meaning one unique land use or "no change" and "2" meaning two non-unique land uses or "change" between two points in time.
Transition Matrix Analysis. We performed Transition Matrix Analysis 31,73-76 to quantify how the floodplain land use changed from one class to the other(s) between two years. Among the "change" grid-cells where this analysis suggested human alterations (as defined above; more specifically, forest to developed, forest to agriculture, grassland to agriculture, agriculture to developed, and wetland to agriculture), we calculated LU ChangeArea i by multiplying the corresponding total number of grid-cells with the spatial resolution of a single grid-cell (250 × 250 m 2 ).
Characteristic pattern and temporal trends of floodplain alterations. We aggregated the results of transition analyses to form two-dimensional matrixes at continental scales (Supplementary Tables 2-7). We then used these matrixes to identify the characteristic pattern of floodplain alterations in every continent by calculating the relative proportions of alteration types (Fig. 1b). We also calculated the relative proportions of continents to the global sum of each alteration type, thus identifying how the floodplain alterations at different continents contribute to the overall global floodplain alterations ( Supplementary Fig. 1).
Next, we used the 27-year annual floodplain land use time-series (1992-2019) to calculate increases in altered floodplain areas over time in every continent, thus identifying continents that may be needing priority global policy attention for floodplain protection and restoration (Fig. 1c). We also performed a similar temporal analysis at basin scales for six major basins from six different continents, including the Great Lakes Basin in North America, the Amazon River Basin in South America, the Danube River Basin in Europe, the Nile River Basin in Africa, the Yangtze River Basin in Asia, and the Murray River Basin in Australia/Oceania, to provide an explicit understanding of the temporal dynamics of floodplain alterations in those basins ( Fig. 2; Supplementary Figs. 2, 3).

Relative difference between floodplain and non-floodplain alterations.
Our approach to calculate non-floodplain alterations was the same as described in step 1, except we first excluded floodplain extents from the reclassified land use dataset and then applied the subsequent operations on the non-floodplain portion of the basins. We therefore calculated the degree to which floodplain alterations differed from "non-floodplain" landscape alterations between 1992 and 2019 (Fig. 1d).

Data records
The global floodplain alteration dataset is available through the HydroShare open geospatial data sharing platform. Our data record also includes all corresponding input data, intermediate calculations, and supporting information. Tables 1, 2 below summarize the file contents. The entire data record can be downloaded as a single zip file from this web link 32 : https://doi.org/10.4211/hs.cdb5fd97e0644a14b22e58d05299f69b.

Technical Validation
Our input datasets, the GFPLAIN250m floodplain and CCI land use, have been validated in numerous published research 31,33,77-79 , including our previous work on the Mississippi River Basin 31 . We therefore validated our output -the global floodplain alteration dataset -using the global human modification dataset 61,62 as a reference. This is the only dataset, to our knowledge, that is comparable to ours yet applies independent inputs as well as different underlying concepts and goals, to validate human alterations of global floodplains. (2023) 10:499 | https://doi.org/10.1038/s41597-023-02382-x www.nature.com/scientificdata www.nature.com/scientificdata/ The global human modification verification data are a group of indices indicating where natural terrestrial ecosystems modifications occur based on 14 stressors, e.g., urban development, agricultural expansion, transportation network, power lines, and air pollution, among others. The degree of modification by each stressor combines to a composite index ranging from 0 to 1: "1" is the maximum modification. Stressor assessment is based on remote sensing data (e.g., CCI land use 35 We used the 2017 human modification dataset to compare with floodplain alterations in 2017 (estimated with respect to 1992, which falls within our 27-year (1992-2019) annual land use change analysis). Because the two datasets are conceptually different, were developed using different methodology, and targeted different goals and outputs, complex statistical tests comparing the two would not be meaningful. Therefore, our verification focused on parsimonious statistical calculations to justify physical consistency and qualitative visual  Total area (km 2 ) for each of the five alteration types (see Table 2   www.nature.com/scientificdata www.nature.com/scientificdata/ interpretation to understand semantic relatedness 87 between the two datasets. Specifically, we first extracted human modification values for the altered floodplain grid-cells at continental scales. We then calculated the median, 25 th percentile and 75 th percentile of human modification data across those grid-cells (Fig. 4). Results suggest that our altered floodplain grid-cells correspond to considerably high human modification values (except Oceania), thus indicating the physical consistency of our data with an independent reference estimate. Next, we randomly selected six floodplain locations (4,000-30,000 km 2 ) from six continents and visually inspected their one-on-one resemblance. The high visual resemblance between the two independent datasets across randomly selected locations (Fig. 4) indicates the robust spatial reasoning of our methodology.

Usage Notes
The multi-faceted information provided by our Global Floodplain Alteration dataset, e.g., forest to agriculture constituting 83% of floodplain alterations in South America (Fig. 1b) or North America being responsible for 34% of total wetland to agriculture transitions along the global floodplains ( Supplementary Fig. 1), are readily useful for making robust policy decisions. However, here we present an additional example demonstrating how some of these information can be reused and integrated into broader research and policy decision workflows. Specifically, we calculated a historical floodplain alteration rate (1965-1992) based on a coarse-resolution (1-km) global land use dataset that hindcasts back to 1960s 88 . We then compared this historical floodplain alteration rate (1965-1992) with a contemporary floodplain alteration rate (1992-2009) based on our dataset at finer resolution (0.25-km). Such a comparison of floodplain alteration rates during the past 27 years (1965-1992) and recent 27 years (1992-2019), at global and continental scales, allowed to examine whether and to what extent floodplain alterations have accelerated (or slowed) in the recent years (Fig. 5). This use case did not reveal any specific pattern, meaning a region with high antecedent floodplain alteration rate before 1992 (the baseline year of our analysis) may not continue to show steeper rates of change in the subsequent years (e.g., North America). This example highlights the importance of long-term, temporally continuous analysis of floodplain alteration in addition to mapping a snapshot of floodplain connectivity/integrity 89 .
To enable instantaneous and effective integration of our floodplain alteration dataset into other scientific workflows such as the one exemplified above, we made our dataset, relevant inputs, and underlying methodology Findable, Accessible, Interoperable, and Reusable (FAIR) 90 . We ensure these FAIR properties by developing three semi-automatic programming tools: (1) Floodplain Mapping Tool, (2) Land Use Change Tool, and (3) Human Alteration Tool. These tools were designed with different levels of complexity to support both the general users and advanced users.
By gathering user-inputs on geographical extent, each of these tools automatically conducts a set of computation steps in Python programming language and that entirely in Google's web-based high-performance computing platform called Google Colaboratory. Users will be able to run these tools on an interactive web browser without having to write any new code, deal with GIS software, and manually download and process input datasets, and do computations in personal computers. More importantly, these tools are scalable, meaning advanced users can modify these tools and/or make them interoperate directly with other existing tools, thereby promoting the Open Science principles for cross-disciplinary floodplain research, management, and conservation-restoration decisions.
Links to these tools are provided in the Code Availability section. Users, depending on their objectives, can run these tools in a sequence and answer the following questions.
How can we identify floodplains? The Floodplain Mapping Tool will let general users instantaneously access GFPLAIN250m floodplain database and create interactive floodplain maps directly on a web browser. We supplemented this tool by preparing a tutorial via an online data-driven geoscience education platform: https:// serc.carleton.edu/hydromodules/steps/246320.html. To further assist the general users, we developed an audiovisual tutorial with step-by-step instructions: https://youtu.be/TgMbkJdALig. www.nature.com/scientificdata www.nature.com/scientificdata/ Where in a floodplain land use changed over time? The Land Use Change Tool will let general users perform the basic computation steps, including land use reclassification, extraction of land use within floodplain extents, and finally detection of floodplain land use change (see Methods section for detail descriptions). We developed part of this tool during our previous work on the Mississippi River Basin 31 , and subsequently enhanced the tool by making it globally applicable. We supplemented this tool by preparing a tutorial via an online data-driven geoscience education platform: https://serc.carleton.edu/hydromodules/steps/241489.html. The corresponding audiovisual demonstration is available at: https://youtu.be/wH0gif_y15A.

How to quantify human alterations of floodplains from land use change data? The Human
Alteration Tool is geared mainly towards the advanced users. The tool can perform transition matrix analysis using land use change data (i.e., how one land use class changed into another class between two years), identify the transition types posing negative impacts on floodplain functions (i.e., forest to developed, forest to agriculture, grassland to agriculture, agriculture to developed, and wetland to agriculture), and subsequently calculate degree of human alterations as percent floodplain extent (see Methods section for detail descriptions).

code availability
The global floodplain alteration dataset was derived entirely through ArcGIS 10.5 and ENVI 5.1 geospatial analysis platforms. To assist in reuse and application of the dataset, we developed additional Python codes aggregated as three web-based tools: Floodplain